Online Learning with Gaussian Payoffs and Side Observations
We consider a sequential learning problem with Gaussian payoffs and side
information: after selecting an action $i$, the learner receives information
about the payoff of every action $j$ in the form of Gaussian observations whose
mean is the same as the mean payoff, but the variance depends on the pair
$(i,j)$ (and may be infinite). The setup allows a more refined information
transfer from one action to another than previous partial monitoring setups,
including the recently introduced graph-structured feedback case. For the first
time in the literature, we provide non-asymptotic problem-dependent lower
bounds on the regret of any algorithm, which recover existing asymptotic
problem-dependent lower bounds and finite-time minimax lower bounds available
in the literature. We also provide algorithms that achieve the
problem-dependent lower bound (up to some universal constant factor) or the
minimax lower bounds (up to logarithmic factors).
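
The feedback model described in the abstract can be illustrated with a minimal simulation sketch. It is not the authors' code: the function name `observe`, the example means, and the standard-deviation matrix `sigma` are all hypothetical values chosen for illustration. Representing an infinite-variance observation as `None` is also an assumption made here for convenience.

```python
import math
import random


def observe(means, sigma, chosen, rng):
    """Return one side observation per action after playing `chosen`.

    means[j]    : mean payoff of action j (hypothetical values)
    sigma[i][j] : standard deviation of the observation about action j
                  when action i is played; math.inf means the observation
                  carries no information about j.
    """
    observations = []
    for j, mu in enumerate(means):
        s = sigma[chosen][j]
        if math.isinf(s):
            # Infinite variance: nothing is learned about action j.
            observations.append(None)
        else:
            # Gaussian observation centred on the mean payoff of action j.
            observations.append(rng.gauss(mu, s))
    return observations


# Illustrative three-action instance (assumed numbers, not from the paper).
means = [0.5, 0.3, 0.1]
sigma = [
    [1.0, 2.0, math.inf],
    [2.0, 1.0, 2.0],
    [math.inf, 2.0, 1.0],
]

rng = random.Random(0)
feedback = observe(means, sigma, chosen=0, rng=rng)
```

Setting every off-diagonal entry of `sigma` to `math.inf` gives bandit feedback, and making all entries finite and equal gives full information. Graph-structured feedback corresponds to entries that are either infinite or equal to one common finite value.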